NLP Terminology Glossary

DRAFT — Internal Review Only · Not for distribution in wide release

This document is a working draft intended for internal review by the Enterprise OMOP implementation team and our growing network of collaborators. It has not been finalized, peer-reviewed, or approved for external distribution.

See releases and roadmap for details.

Context

See Notes and NLP Design for the problem overview and Architecture for the full schema specification.

Terms

Term	Definition	Source
Assertion	A contextual attribute of an extracted entity span — negation, experiencer, temporality, certainty. The standard annotation task in clinical NLP.	i2b2 2010, n2c2
Context	The algorithm and process for detecting assertions. Named after the ConText algorithm.	Harkema et al. (2009)
Modifier	Implementation-level term for assertion attributes in NLP frameworks.	medspaCy, cTAKES, OMOP
Span	A contiguous segment of text identified by an NLP system, with character-level offsets.	OHDSI NLP WG
Entity	A clinically meaningful concept identified within a span (e.g., a condition, drug, measurement).	General NLP
Negation	An assertion that a clinical finding is absent. e.g., "denies chest pain" — chest pain is negated.	i2b2 assertion categories
Experiencer	An assertion about who has the finding — patient or someone else (e.g., family member).	ConText (Harkema et al.)
Temporality	An assertion about when a finding occurred — current, historical, or hypothetical.	ConText (Harkema et al.)
Certainty	An assertion about the confidence level — definite, possible, conditional, hypothetical.	i2b2 assertion categories
Source Provenance	The `source` / `source_uri` / `source_version` triple that records where an NLP system, pipeline, or component originates — platform (e.g., GitHub, PyPI, HuggingFace), addressable location, and machine-verifiable version (git SHA, package version, Docker tag).	Emory Enterprise OMOP
`_DERIVED` table	An OMOP CDM table suffixed with `_DERIVED` that contains NLP-extracted data, structurally separated from discrete EHR data.	Emory Enterprise OMOP

References

Uzuner Ö, South BR, Shen S, DuVall SL. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association. 2011;18(5):552–556. PMC3168320
Harkema H, Dowling JN, Thornblade T, Chapman WW. ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports. Journal of Biomedical Informatics. 2009;42(5):839–851. PMC2757457
n2c2 NLP Clinical Challenges — successor to i2b2, continuing shared tasks on clinical NLP including context classification. n2c2.dbmi.hms.harvard.edu
OHDSI NLP Working Group — community-driven standards for NLP output representation in the OMOP CDM. ohdsi.org

Notes and NLP Design — The problem and why NLP metadata matters
Architecture — The 4-layer schema specification
Entity Relationship Diagram — Visual schema reference

NLP Terminology Glossary

Terms

References

Related Pages